A Searchable Compressed Edit-Sensitive Parsing
نویسندگان
چکیده
A searchable data structure for the edit-sensitive parsing (ESP) is proposed. Given a string S, its ESP tree is equivalent to a context-free grammar G generating just S, which is represented by a DAG. Using the succinct data structures for trees and permutations, G is decomposed to two LOUDS bit strings and single array in (1+ε)n log n+ 4n+o(n) bits for any 0 < ε < 1 and the number n of variables in G. The time to count occurrences of P in S is in O( 1 ε (m log n+occc(logm log u)), whereas m = |P |, u = |S|, and occc is the number of occurrences of a maximal common subtree in ESPs of P and S. The efficiency of the proposed index is evaluated by the experiments conducted on several benchmarks complying with the other compressed indexes.
منابع مشابه
Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Methods for evaluating dependency parsing using attachment scores are highly sensitive to representational variation between dependency treebanks, making cross-experimental evaluation opaque. This paper develops a robust procedure for cross-experimental evaluation, based on deterministic unificationbased operations for harmonizing different representations and a refined notion of tree edit dist...
متن کاملOnline Pattern Matching for String Edit Distance with Moves
Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies betwee...
متن کاملBrowse searchable encryption schemes: Classification, methods and recent developments
With the advent of cloud computing, data owners tend to submit their data to cloud servers and allow users to access data when needed. However, outsourcing sensitive data will lead to privacy issues. Encrypting data before outsourcing solves privacy issues, but in this case, we will lose the ability to search the data. Searchable encryption (SE) schemes have been proposed to achieve this featur...
متن کاملsiEDM: an efficient string index and search algorithm for edit distance with moves
Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has...
متن کاملMRCSI: Compressing and Searching String Collections with Multiple References
Efficiently storing and searching collections of similar strings, such as large populations of genomes or long change histories of documents from Wikis, is a timely and challenging problem. Several recent proposals could drastically reduce space requirements by exploiting the similarity between strings in so-called referencebased compression. However, these indexes are usually not searchable an...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1101.0080 شماره
صفحات -
تاریخ انتشار 2010